# INT8 quantization
## Mistral Small 3.1 24B Instruct 2503 Quantized.w8a8
**Maintainer:** RedHatAI · **License:** Apache-2.0 · **Downloads:** 833 · **Likes:** 2

An INT8-quantized version of Mistral-Small-3.1-24B-Instruct-2503, optimized by Red Hat and Neural Magic for fast-response, low-latency scenarios.
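The w8a8 scheme quantizes both weights and activations to 8-bit integers. As a minimal sketch of the underlying idea (symmetric per-tensor INT8 quantization in NumPy, not the actual Red Hat / Neural Magic compression pipeline):

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor INT8 quantization: map [-amax, amax] onto [-127, 127]."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from INT8 codes and their scale."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)  # toy stand-in for a weight tensor
q, s = quantize_int8(w)
w_hat = dequantize(q, s)

# Rounding error per element is bounded by half a quantization step (scale / 2).
print("max abs error:", np.max(np.abs(w - w_hat)))
```

Storing `q` instead of `w` cuts memory 4x versus FP32 (2x versus FP16), which is where the reduced GPU memory footprint of these models comes from.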
## Mistral Small 24B Instruct 2501 Quantized.w8a8
**Maintainer:** RedHatAI · **License:** Apache-2.0 · **Downloads:** 158 · **Likes:** 1
**Tags:** Large Language Model · Safetensors · Supports Multiple Languages

An INT8-quantized version of the 24B-parameter Mistral instruction-tuned model, significantly reducing GPU memory requirements and improving computational throughput.
## Deepseek R1 Distill Qwen 32B Quantized.w8a8
**Maintainer:** neuralmagic · **License:** MIT · **Downloads:** 2,324 · **Likes:** 9
**Tags:** Large Language Model · Transformers

An INT8-quantized version of DeepSeek-R1-Distill-Qwen-32B; weight and activation quantization reduce VRAM usage and improve computational efficiency.
## Qwen2.5 7B Instruct Quantized.w8a8
**Maintainer:** RedHatAI · **License:** Apache-2.0 · **Downloads:** 412 · **Likes:** 1
**Tags:** Large Language Model · Safetensors · English

An INT8-quantized version of Qwen2.5-7B-Instruct with reduced memory requirements and improved computational throughput, suitable for multilingual use in both commercial and research applications.
## BAAI Bge M3 Int8
**Maintainer:** libryo-ai · **License:** MIT · **Downloads:** 1,007 · **Likes:** 1
**Tags:** Text Embedding · Transformers

An ONNX INT8-quantized version of BAAI/bge-m3 for dense retrieval tasks, optimized for compatibility with Vespa embedding inference.
## Vit Base Patch16 224 Int8 Static Inc
**Maintainer:** Intel · **License:** Apache-2.0 · **Downloads:** 82 · **Likes:** 1
**Tags:** Image Classification · Transformers

An INT8 PyTorch model produced by post-training static quantization with Intel® Neural Compressor from a fine-tuned Google ViT model, significantly reducing model size while maintaining high accuracy.
## Ibert Roberta Large
**Maintainer:** kssteven · **Downloads:** 45 · **Likes:** 0
**Tags:** Large Language Model · Transformers

I-BERT is an integer-only quantized version of RoBERTa-large: parameters are stored in INT8 and inference runs on integer arithmetic, yielding up to 4x inference speedup.
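Integer-only inference of the kind I-BERT uses keeps the matrix multiply itself in integer arithmetic: INT8 operands are multiplied and accumulated in INT32, and the float scales are applied only once at the output. A hedged NumPy sketch of that idea (illustrative, not I-BERT's actual kernels or its exact quantization scheme):

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    # Symmetric per-tensor INT8 quantization (an assumption for illustration).
    scale = np.max(np.abs(x)) / 127.0
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8), scale

rng = np.random.default_rng(1)
a = rng.normal(size=(3, 5)).astype(np.float32)   # toy "activations"
w = rng.normal(size=(5, 2)).astype(np.float32)   # toy "weights"

qa, sa = quantize_int8(a)
qw, sw = quantize_int8(w)

# Integer matmul with INT32 accumulation; the two scales combine multiplicatively.
acc = qa.astype(np.int32) @ qw.astype(np.int32)
y_int_path = acc.astype(np.float32) * (sa * sw)

# Compare against the full-precision result.
y_fp32 = a @ w
print("max abs deviation from FP32:", np.max(np.abs(y_int_path - y_fp32)))
```

The speedup comes from the inner loop running entirely on cheap INT8xINT8 to INT32 operations; the float rescale touches each output element only once.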